1 Introduction

1.1 Motivation

The spatialHeatmap package provides functionalities for visualizing cell-, tissue- and organ-specific data of biological assays by coloring the corresponding spatial features defined in anatomical images according to a numeric color key. The color scheme used to represent the assay values can be customized by the user. This core functionality is called a spatial heatmap plot. It is enhanced with nearest neighbor visualization tools for groups of measured items (e.g. gene modules) sharing related abundance profiles, including matrix heatmaps combined with hierarchical clustering dendrograms and network representations. The functionalities of spatialHeatmap can be used either in a command-driven mode from within R or a graphical user interface (GUI) provided by a Shiny App that is also part of this package. While the R-based mode provides flexibility to customize and automate analysis routines, the Shiny App includes a variety of convenience features that will appeal to many biologists. Moreover, the Shiny App has been designed to work on both local computers as well as server-based deployments (e.g. cloud-based or custom servers) that can be accessed remotely as a centralized web service for using spatialHeatmap's functionalities with community and/or private data. The functionality overview of this package is illustrated on Figure 1.

Functionality overview. The numeric data can come as a vector, data frame, or SummarizedExperiment (SE). If vector or data frame, the sample and condition identifiers should be in the form of 'sample__condition', e.g. 'S1__con1'. If data frame or SE, the columns and rows should be sample/conditions and features (gene1, gene2) respectively. In SE, the colData slot is required and contains replicate information, while the rowData slot is optional and contains row feature annotation. If the latter is available, the annotation is seen by mousing over a node in the interactive network. In the aSVG image (see [aSVG](#term) below), features are pre-defined and assigned unique identifiers. In visualising process, only features with exactly same identifiers between the data and aSVG image are recognised and coloured (e.g. S1) in spatial heatmaps. To supplement spatial heatmaps, coexpression analysis is applied on 'data matrix' to identify network modules. The gene in spatial heatmaps can be investigated in the gene module it belongs to, where the module is in form of matrix heatmap and network. Lastly, the spatial heatmaps, matrix heatmap, network are all combined as an interactive Shiny app.

Figure 1: Functionality overview
The numeric data can come as a vector, data frame, or SummarizedExperiment (SE). If vector or data frame, the sample and condition identifiers should be in the form of ’sample__condition’, e.g. ’S1__con1’. If data frame or SE, the columns and rows should be sample/conditions and features (gene1, gene2) respectively. In SE, the colData slot is required and contains replicate information, while the rowData slot is optional and contains row feature annotation. If the latter is available, the annotation is seen by mousing over a node in the interactive network. In the aSVG image (see aSVG below), features are pre-defined and assigned unique identifiers. In visualising process, only features with exactly same identifiers between the data and aSVG image are recognised and coloured (e.g. S1) in spatial heatmaps. To supplement spatial heatmaps, coexpression analysis is applied on ‘data matrix’ to identify network modules. The gene in spatial heatmaps can be investigated in the gene module it belongs to, where the module is in form of matrix heatmap and network. Lastly, the spatial heatmaps, matrix heatmap, network are all combined as an interactive Shiny app.

[ThG-Comment 1: Please add here an overview illustration (“visual abstract”) of the entire package that outlines all its core functionalities in one figure. Similar to Fig 1 and Fig 2 in the signatureSearch vignette here. In your case, the package design illustration (from Fig 2) could go into the same figure. To keep it simple and reusable, please create this illustration in a Google Drawing that you share with me. This way we can jointly edit it, and export to SVG or PNG as needed. Please take your time to create a quality illustration for this purpose that both looks professional and is informative too. The intent of this illustration is to use it in various places, e.g. slide shows, posters, vignette and paper.] Added
In the following all blue text are changes from jianhai.

As anatomical images the package supports both tissue maps from public repositories and custom images provided by the user. In general any type of image can be used as long as it can be provided in SVG (Scalable Vector Graphics) format, where the corresponding spatial features have been defined (see aSVG below). The numeric values plotted onto a spatial heatmap are usually quantitative measurements from a wide range of profiling technologies, such as microarrays, next generation sequencing (e.g. RNA-Seq and scRNA-Seq), proteomics, metabolomics, or many other small- or large-scale experiments. For convenience, several preprocessing and normalization methods for the most common use cases are included that support raw and/or preprocessed data. Currently, the main application domains of the spatialHeatmap package are numeric data sets and spatially mapped images from biological and biomedical areas. Moreover, the package has been designed to also work with many other spatial data types, such a population data plotted onto geographic maps. This high level of flexibility is one of the unique features of spatialHeatmap. Related software tools for biological applications in this field are largely based on pure web applications (Winter et al. 2007; Waese et al. 2017) or local tools (Maag 2018; Muschelli, Sweeney, and Crainiceanu 2014) that typically lack customization functionalities. These restrictions limit users to utilizing pre-existing expression data and/or fixed sets of anatomical image collections. To close this gap for biological use cases, we have developed spatialHeatmap as a generic R/Bioconductor package for plotting quantitative values onto any type of spatially mapped images in a programmable environment and/or in an intuitive to use GUI application.

1.2 Design

The core feature of spatialHeatmap is to map the assay values (e.g. gene expression data) of one or many items (e.g. genes) measured under different conditions in form of numerically graded colors onto the corresponding cell types or tissues represented in a chosen SVG image. In the gene profiling field, this feature supports comparisons of the expression values among multiple genes by plotting their spatial heatmaps next to each other. Similarly, one can display the expression values of a single or multiple genes across multiple conditions in the same plot (Figure 4). This level of flexibility is very efficient for visualizing complicated expression patterns across genes, cell types and conditions. In case of more complex anatomical images composed of overlapping multiple layer tissues, it is important to visually expose the tissue layer of interest in the plots. To address this, several default and customizable layer viewing options are provided. They allow to hide features in the top layers by making them transparent in order to expose features below them. This transparency viewing feature is highlighted below in the mouse example (Figure 6).

To maximize reusability and extensibility, the package organizes large-scale omics assay data along with the associated experimental design information in a SummarizedExperiment object. The latter is one of the core S4 classes within the Bioconductor ecosystem that has been widely adapted by many other software packages dealing with gene-, protein- and metabolite-level profiling data (Morgan et al. 2018). In case of gene expression data, the assays slot of the SummarizedExperiment container is populated with a gene expression matrix, where the rows and columns represent the genes and tissue/conditions, respectively, while the colData slot contains replicate information. The tissues and/or cell type information in the object maps via colData to the corresponding features in the SVG images using unique identifiers for the spatial features (e.g. tissues or cell types). This allows to color the features of interest in an SVG image according to the numeric data stored in a SummarizedExperiment object. In cases of simple assays, there is no need to use the advanced SummarizedExperiment, and the data can come as a vector or data frame. The former applies to several numeric values measured from a single gene, protein, metabolite, etc, while the latter applies to several samples and/or condidions, e.g. two tissues under two conditions. Details about how to access the SVG images and properly format the associated expression data are provided in the Supplement section of this vignette.

1.3 Terminology

Spatial heatmaps are images where colors encode numeric values in features of any shape. For plotting spatial heatmaps, Scalable Vector Graphics (SVG) has been chosen as image format since it is a flexible and widely adapted vector graphics format that provides many advantages for computationally embedding numerical and other information in images. SVG is based on XML formatted text describing all components present in images, including lines, shapes and colors. In case of biological images suitable for spatial heatmaps, the shapes often represent anatomical or cell structures. To assign colors to specific features in spatial heatmaps, annotated SVG (aSVG) files are used where the shapes of interest are labeled according to certain conventions so that they can be addressed and colored programmatically. SVGs and aSVGs of anatomical structures can be downloaded from many sources including the repositories described below. Alternatively, users can generate them themselves with vector graphics software. Inkscape is highly recommended, since other software could introduce potential issues due to default parameters. Typically, in aSVGs one or more shapes of a feature of interest, such as the cell shapes of an organ, are grouped together by a common feature identifier. Via these group identifiers one or many feature types can be colored simultaneously in an aSVG according to biological experiments assaying the corresponding feature types with the required spatial resolution. It is very important that correct assignment of image features and assay results is assured by using for both the same feature identifiers. The color gradient used to visually represent the numeric assay values is controlled by a color gradient parameter. To visually interpret the meaning of the colors, the corresponding color key is included in the spatial heatmap plots. The formatting details for properly annotating both aSVG images and assay data are provided in the Supplement section of this vignette.

1.4 Data Repositories

If not generated by the user, spatial heatmaps can be generated with data downloaded from various public repositories. This includes gene, protein and metabolic profiling data from databases, such as GEO, BAR and EBI. A particularly useful resource, when working with spatialHeatmap, is the Expression Atlas from EMBL-EBI (Papatheodorou et al. 2018). This online service contains both assay and anatomical images. Its assays data include mRNA and protein profiling experiments for different species, tissues and conditions. The corresponding anatomical image collections are also provided for a wide range of species including animals and plants. In spatialHeatmap several import functions are provided to work with the expression and aSVG repository from the Expression Atlas directly. The aSVG images developed by this project will be deposited in this repository, and users are encouraged to contribute aSVG images to the same repository by following their authoring guidlines.

1.5 Tutorial Overview

The following sections of this vignette showcase the most important functionalities of the spatialHeatmap package using as first example a simple to understand toy data set (jianhai: is the phrase order mixed?) , and then more complex mRNA profiling data from the Expression Atlas. First, spatial heatmap plots are generated for both the toy and mRNA expression data. The latter include gene expression data sets from RNA-Seq and microarray experiments of Human Brain, Mouse Organs, Chicken Organs, and Arabidopsis Shoots. The first three are RNA-Seq data from the Expression Atlas and the last one is a microarray data set from GEO. Second, gene context visualization features are introduced, which facilitate the visualization of gene modules sharing similar expression patterns. This includes the visualization of hierarchical clustering results with traditional matrix heatmaps (Matrix Heatmap) as well co-expression network plots (Network). Third, an overview of the corresponding Shiny App is presented that provides access to the same functionalities as the R functions, but executes them in an interactive environment (Chang et al., n.d.; Chang and Borges Ribeiro 2018). Fourth, more advanced features for plotting customized spatial heatmaps are introduced.

2 Getting Started

2.1 Installation

The spatialHeatmap package should be installed from an R (version \(\ge\) 3.6) session with the BiocManager::install command.

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("spatialHeatmap")

2.2 Load Packages and Documentation

Next, the packages required for running the sample code in this vignette need to be loaded.

library(spatialHeatmap); library(SummarizedExperiment); library(ExpressionAtlas); library(GEOquery)

The following lists the vignette(s) of this package in an HTML browser. Clicking the corresponding name will open this vignette.

browseVignettes('spatialHeatmap')

3 Spatial Heatmaps

[ThG-Comment 2: how can the user find out what features are defined (mapped) in the existing SVG images provided by your SVG collection, or the one from Expression Atlas? A very common use case would be that the user has generated some cell/tissue specific expression data, and then wants to map it to an existing mapped SVG. How can they find out what SVGs contain the features they have data for, so that they can properly set up their expression data for plotting and choose the appropriate SVG images rather than generating their own. Shouldn’t each SVG be accompanied with a feature table, or an R function provided that returns this info from a mapped SVG by parsing it? This would be an obvious utility reviewers and user will ask for in a package that is supposed to provide a high level of flexibilty and reusability, and I don’t think it would be hard to address this important need.]

jianhai: return_feature and update_feature is designed to extraxt features and add custom features respectively. The snake case is used so as to keep consistent with all other functions.

3.1 Toy Example

Spatial heatmaps are plotted with the spatial_hm function. To provide a quick and transparent overview how these plots are generated, the following uses a generalized toy example where a vector of random numeric values is generated that are used to color the corresponding features in an aSVG image. The image chosen for this example is from the human brain (homo_sapiens.brain.svg), and has been included in this package for testing purposes . The path to the file of this image on a user's system, where spatialHeatmap is installed, can be obtained with the system.file function.

# Directory of the aSVG collection.
svg.dir <- system.file("extdata/shinyApp/example", package="spatialHeatmap")
# Path of the target aSVG image.
svg.hum <- system.file("extdata/shinyApp/example", 'homo_sapiens.brain.svg', package="spatialHeatmap")

The return_feature function is designed to query a collection of aSVG files in a directory with feature (‘lobe’) and species (‘homo sapiens’) keywords, and return matched features in a data frame.

feature.df <- return_feature(feature=c('lobe'), species=c('homo sapiens'), dir=svg.dir)
## Accessing features... 
## gallus_gallus.svg, homo_sapiens.brain.svg, mus_musculus.male.svg, organ_final.svg, root_cross_final.svg, root_roottip_final.svg, shoot_final.svg, shoot_root_final.svg, us_map_final.svg,
feature.df
##          feature             id                    SVG    parent
## 1 occipital_lobe occipital_lobe homo_sapiens.brain.svg container
## 2  parietal_lobe  parietal_lobe homo_sapiens.brain.svg container
## 3  temporal_lobe  temporal_lobe homo_sapiens.brain.svg container
##   index index1
## 1    12     12
## 2    13     13
## 3    28     28
fnames <- feature.df[, 1]

A small numeric toy vector is generated, where the data slot contains four numbers and its name slot is populated with the three feature names obtained from the aSVG image and “notMapped”. Note, the numbers are mapped to features via names matching among the numeric vector and the aSVG, respectively. Accordingly, only numbers and features with matching name counterparts can be colored in the aSVG image. Entries without name matches are indicated by a message printed to the R console, e.g. “notMapped”. This behavior can be turned off with verbose=FALSE in the corresponding function call. In addition, a summary of the numeric assay to feature mappings is stored in the result data frame returned by the spatial_hm function.

# Random values.
my_vec <- sample(1:100, length(unique(fnames))+1)
# Name the vector.
names(my_vec) <- c(unique(fnames), 'notMapped')
my_vec
## occipital_lobe  parietal_lobe  temporal_lobe      notMapped 
##             32             86             28             78

Next, the spatial heatmap is plotted with the spatial_hm function (Figure 2). Internally, the numbers in my_vec are translated to colors based on the color key assigned to the col.com argument, and then painted onto the corresponding features in the aSVG image (here path svg.hum). In addition, the ID argument is used to name the data, ncol sets the number of columns for the spatial heatmap, which does not include the legend plot, and height is the height of spatial heatmap relative to width. In Figure 2, only ‘occipital_lobe’, ‘parietal_lobe’, and ‘temporal_lobe’ are mapped, since they are the matching features between the data and aSVG.

shm.df <- spatial_hm(svg.path=svg.hum, data=my_vec, ID='toy', ncol=1, height=0.7, sub.title.size=20)
## Enrties not mapped: notMapped
Spatial heatmap on toy data. The middle plot is the spatial heatmap and the right is the legend.

Figure 2: Spatial heatmap on toy data
The middle plot is the spatial heatmap and the right is the legend.

The resulting spatial heatmaps (‘spatial_heatmap’) and mapped features (‘mapped_feature’) between data and aSVG are saved in a list.

# Spatial heatmaps and mapped features are stored in a list.
names(shm.df)
## [1] "spatial_heatmap" "mapped_feature"
# Mapped features
shm.df[['mapped_feature']]
##   rowID     featureSVG value
## 1   toy occipital_lobe    32
## 2   toy  parietal_lobe    86
## 3   toy  temporal_lobe    28

[ThG-Comment 3: Jianhai, please add a section that accomplishes what is described in the above text and code chunks. When done the corresponding functional code chunks should be functional and evaluated in the vignette. This will help to give the user a quick code-based intro into the core functionalities of the spatial_hm function without requiring too much background knowledge about expression data and other details. Note, this section overlaps with the comment #2, since it requires some basic functionality to extract feature names from pre-annotated SVGs.]

Added

3.2 Human Brain

This subsection introduces how to find cell- and tissue-specific assay data in the Expression Atlas database. After choosing a gene expression experiment, the data is downloaded directly into a user's R session and preprocessed. Subsequently, the processed expression values of genes selected by users are plotted onto a chosen aSVG image. In this case, the query and downloading functionalities of expression data are provided by functionalities of the ExpressionAtlas package (Keays 2019).

The following example searches the Expression Atlas for expression data derived from specific tissues and species of interest, here ‘cerebellum’ and ‘Homo sapiens’, respectively.

all.hum <- searchAtlasExperiments(properties="cerebellum", species="Homo sapiens")

The search result is stored in a DataFrame containing 13 accessions matching the above query. For the following sample code, the accession ‘E-GEOD-67196’ from Prudencio et al. (2015) has been chosen, which corresponds to an RNA-Seq profiling experiment of ‘cerebellum’ and ‘frontal cortex’ brain tissue from patients with amyotrophic lateral sclerosis (ALS). Details about the corresponding record can be returned as follows.

all.hum[2, ]
## DataFrame with 1 row and 4 columns
##      Accession      Species                  Type
##    <character>  <character>           <character>
## 1 E-GEOD-67196 Homo sapiens RNA-seq of coding RNA
##                                                                                                                                   Title
##                                                                                                                             <character>
## 1 Transcription profiling by high throughput sequencing of cerebellum and frontal cortex from patients of amyotrophic lateral sclerosis

The getAtlasData function allows to download the chosen RNA-Seq experiment from the Expression Atlas and import it into a RangedSummarizedExperiment of a user's R session.

Regarding the data type, if the data involves complex samples and conditions (mouse example, chicken example, Arabidopsis example), SummarizedExperiment or RangedSummarizedExperiment is highly recommended. Otherwise, if the data contains simple samples and conditions, it can come as a vector (toy example) or data frame. The function sapatial_hm will distinguish the data types internally. In this example, the default RangedSummarizedExperiment is used. The usage of vector and data frame is detailed in Supplement.

rse.hum <- getAtlasData('E-GEOD-67196')[[1]][[1]]

The design of the downloaded RNA-Seq experiment is described in the colData slot of rse.hum. The following returns only its first five rows and columns.

colData(rse.hum)[1:5, 1:5]
## DataFrame with 5 rows and 5 columns
##            AtlasAssayGroup     organism   individual
##                <character>  <character>  <character>
## SRR1927019              g1 Homo sapiens  individual1
## SRR1927020              g2 Homo sapiens  individual1
## SRR1927021              g1 Homo sapiens  individual2
## SRR1927022              g2 Homo sapiens  individual2
## SRR1927023              g1 Homo sapiens individual34
##             organism_part                       disease
##               <character>                   <character>
## SRR1927019     cerebellum amyotrophic lateral sclerosis
## SRR1927020 frontal cortex amyotrophic lateral sclerosis
## SRR1927021     cerebellum amyotrophic lateral sclerosis
## SRR1927022 frontal cortex amyotrophic lateral sclerosis
## SRR1927023     cerebellum amyotrophic lateral sclerosis

For downstream plotting purposes it can be desirable to shorten the text in certain columns of colData. This way one can use the source data for including ‘pretty’ sample names in columns and legends of all downstream tables and plots, respectively, in an automated and reproducible manner. To achieve this, the following example imports a ‘targets’ file that can be maintained by the user and is used to replace the text in the colData slot with the shortened version suitable for column titles and legends. This targets file utility is particularly useful for data sets requiring custom annotations.

Note, to plot spatial heatmaps, only the 2 columns of tissue and condition replicates are used. Additional details for generating and using targets files in spatialHeatmap are provided in the Supplement of this vignette.

jianhai: I added this paragraph, since the targte files can include many columns but only the 2 columns of tissue and condition are required, so when users make their own targets file they can save time by ignoring others. DataFrame should be used otherewise errors arise.

hum.tar <- system.file('extdata/shinyApp/example/target_human.txt', package='spatialHeatmap')
target.hum <- read.table(hum.tar, header=TRUE, row.names=1, sep='\t')

Use the tagets file to replace the data frame in colData slot.

colData(rse.hum) <- DataFrame(target.hum)

A slice of the simplified colData object is shown below, where the disease column contains now shorter labels than in the original data set.

colData(rse.hum)[c(1:3, 41:42), 4:5]
## DataFrame with 5 rows and 2 columns
##             organism_part     disease
##               <character> <character>
## SRR1927019     cerebellum         ALS
## SRR1927020 frontal_cortex         ALS
## SRR1927021     cerebellum         ALS
## SRR1927059     cerebellum      normal
## SRR1927060 frontal_cortex      normal

The actual expression data of the downloaded RNA-Seq experiment is stored in the assay slot of rse.hum. Since it contains raw count data, it is often beneficial to apply prior to plotting spatial heatmaps basic preprocessing routines. The following shows how to normalize the count data, aggregate replicates and then remove genes with unreliable expression responses.

[ThG-Comment 4: can you please substantially improve the below text in the following preprocessing code step as well as the help files of the corresponding functions. I am not able to follow why and how certain parts are done. For instance, why is ratio used as a normalization method? The final result also does not seem to be log ratios. You need to better describe what you are doing here. Asking readers to look through your code is not appropriate for a package.] Added

The raw-count normalizing function is norm_data. It builds on calcNormFactors (CNF) from edgeR (McCarthy et al. 2012), and estimateSizeFactors (EST), varianceStabilizingTransformation (VST), rlog from DESeq2 (Love, Huber, and Anders 2014). The argument norm.fun specifies one of the four internal normalizing functions: CNF, EST, VST, rlog. If none, no normalization is applied. The arguments of each internal normalizing function are provided through parameter.list, which is a named list. For example, norm.fun='ESF' and parameter.list=list(type='ratio') is equivalent to estimateSizeFactors(object, type='ratio').

If paramter.list=NULL, the default arguments are applied for the normalizing function provided to norm.fun. See the help file of norm_data for details by running ?norm_data in R console. In this example, ESF is chose for faster speed.

# Normalise.
se.nor.hum <- norm_data(data=rse.hum, norm.fun='ESF', data.trans='log2')
## Normalising: ESF 
##    type 
## "ratio"

The replicates used for aggregation is generated by concatenating the ‘sample’ and ‘contition’ column in colData slot with double underscore (__). The former is specified by sam.factor and the latter by con.factor. For example, in the following, organism_part is sample and disease is condition. Thus cerebellum__ALS, frontal_cortex__ALS, cerebellum__normal, frontal_cortex__normal are generated as ’sample__condition’ replicates and aggregated.

# Aggregate replicates.
se.aggr.hum <- aggr_rep(data=se.nor.hum, sam.factor='organism_part', con.factor='disease', aggr='mean')
assay(se.aggr.hum)[1:3, ]
##                 cerebellum__ALS frontal_cortex__ALS
## ENSG00000000003        7.024054            7.091484
## ENSG00000000005        0.000000            1.540214
## ENSG00000000419        7.866582            8.002549
##                 cerebellum__normal frontal_cortex__normal
## ENSG00000000003           6.406157               7.004446
## ENSG00000000005           0.000000               1.403110
## ENSG00000000419           8.073264               7.955709

The filtering removes unreliable expression measures. Specifically, the following example eliminates genes with expression values larger than 5 in at least 1% of all samples (pOA=c(0.01, 5)), and with coefficient of variance (CV) between 0.30 and 100 (CV=c(0.30, 100)) are retained.

# Filter genes with low variance and low counts.
se.fil.hum <- filter_data(data=se.aggr.hum, sam.factor='organism_part', con.factor='disease', pOA=c(0.01, 5), CV=c(0.3, 100), dir=NULL)

To inspect the results, the following returns three selected rows of the fully preprocessed data matrix (Table 1).

assay(se.fil.hum)[733:735, ]

Table 1: A slice of fully preprocessed data matrix.
cerebellum__ALS frontal_cortex__ALS cerebellum__normal frontal_cortex__normal
ENSG00000268433 5.324064 0.3419665 3.478074 0.1340332
ENSG00000268555 5.954572 2.6148548 4.934974 2.0351776
ENSG00000269113 7.544417 1.7425299 6.808402 0.9694065

The following shows how to download the corresponding pre-annotated aSVG image from the EBI SVG repository. The function return_feature queries the repository with feature and species keywords, i.e. c("frontal cortex", "cerebellum") and c("brain") respectively. The argument keywords.all is set FALSE so that aSVGs matching at least one word are returned. The argument return.all=FALSE means only aSVGs matching the keywords are returned and saved in dir. Otherwise, all aSVGs are returned regardless of the keywords. An empty directory is recommended so as to avoid overwriting existing SVG files with the same names. Here ~/test is used. remote=TRUE means the remote SVG repository is queried. If users want to query a local aSVG collection remote=FALSE should be used, and directory of the aSVG collection should be provided to dir. match.only is set TRUE so that only matching features are returned. If FALSE, all features in the matching aSVGs are returned.

[ThG-Comment 5: please add here the corresponding but unevaluated code to download the aSVG from your repos. Users want to know how this works.]

jianai: The functionality of using EBI online aSVG directly is in development. After done, there is no difference on the user interface. Thus it does not affect reviewing on this vignette.

# Make an empty directory.
dir.create('~/test')
# Query aSVGs.
feature.df <- return_feature(feature=c('frontal cortex', 'cerebellum'), species=c('brain'), keywords.all=FALSE, return.all=FALSE, dir='~/test', remote=TRUE, match.only=TRUE, desc=FALSE)
feature.df

As explained in the toy example, the same aSVG image has been included in this package. To meet the requirements for building vignettes in R packages, the following code section uses the packaged instance of the aSVG file (i.e. local aSVG) rather than downloading it. The ontology ids are available in the R package rols (Gatto 2019) or Ontology Lookup Service.

feature.df <- return_feature(feature=c('frontal cortex', 'cerebellum'), species=c('brain'), keywords.all=FALSE, return.all=FALSE, dir=svg.dir, remote=FALSE)
## Accessing features... 
## gallus_gallus.svg, homo_sapiens.brain.svg, mus_musculus.male.svg, organ_final.svg, root_cross_final.svg, root_roottip_final.svg, shoot_final.svg, shoot_root_final.svg, us_map_final.svg,

The two target tissues (frontal_cortex, cerebellum) in targets file are all included in the aSVG, so it can be used for plotting spatial heatmaps. If users want to change the feature identifiers in the aSVG refer to the Supplement for details.

feature.df
##                feature                   id
## 1       frontal_cortex       frontal_cortex
## 2 middle_frontal_gyrus middle_frontal_gyrus
## 3     cingulate_cortex     cingulate_cortex
## 4    prefrontal_cortex    prefrontal_cortex
## 5      cerebral_cortex      cerebral_cortex
## 6           cerebellum           cerebellum
##                      SVG    parent index index1
## 1 homo_sapiens.brain.svg container     5      5
## 2 homo_sapiens.brain.svg container    11     11
## 3 homo_sapiens.brain.svg container    24     24
## 4 homo_sapiens.brain.svg container    26     26
## 5 homo_sapiens.brain.svg container    27     27
## 6 homo_sapiens.brain.svg container    30     30

Next, the preprocessed expression values of gene ENSG00000268433 are plotted onto the corresponding features of the aSVG image depicting the human brain.

shm.df <- spatial_hm(svg.path=svg.hum, data=se.fil.hum, col.com=c('purple', 'yellow', 'blue'), ID=c('ENSG00000268433'), height=0.6, legend.r=1.3, sam.legend='identical')
Spatial heatmaps of human brain. Cerebellum and frontal cortex are colored, since they are the only 2 identical tissues between the aSVG image and the data.

Figure 3: Spatial heatmaps of human brain
Cerebellum and frontal cortex are colored, since they are the only 2 identical tissues between the aSVG image and the data.

The spatial heatmaps and mapped features are stored in a list and assigned to an object shm.df.

names(shm.df)
## [1] "spatial_heatmap" "mapped_feature"
# Mapped features
shm.df[['mapped_feature']]
##             rowID     featureSVG condition     value
## 1 ENSG00000268433     cerebellum       ALS 5.3240638
## 2 ENSG00000268433 frontal_cortex       ALS 0.3419665
## 3 ENSG00000268433     cerebellum    normal 3.4780744
## 4 ENSG00000268433 frontal_cortex    normal 0.1340332

In this example, the expression profile of gene ENSG00000268433 in frontal cortex and cerebellum is plotted under ALS and normal conditions (Figure 3) with the legend plot on the right. For example, in Table 1 its expression value in cerebellum under ALS is 5.324064. After mapping, this tissue is coloured blue corresponding to 5.324064 in the color scale. By contrast, this gene's expression profile is dark yellow in the same tissue under normal condition. On the other hand, the expression profile (purple) in frontal cortex is similar across ALS and normal. Therefore, it is intuitive that this gene's higher activity in cerebellum is potentially associated with ALS and could contribute to hypothesis generation.

Note that only frontal cortex and cerebellum are coloured while others are blank in the spatial heatmaps, since they are the only 2 matching tissues between the data and aSVG image. And the legend plot only shows these 2 tissues, because sam.legend=identical limits the legend to matching tissues between the data and aSVG. If sam.legend=all, all tissues are shown and legend plot would overlap with spatial heatmaps. Except for identical and all, sam.legend also accepts a vector of specific tissues for display.

[ThG-Comment 6: I suggest to include here a table where you summarize all arguments relevant for controlling the output of spatial_hm function. This will make it much easier for the user to look up relevant arguments and how to use them.] Added

In cases of multiple input genes, and/or multiple conditions, the subplots of spatial heatmaps might get squeezed. To achieve optimal appearance, the main function spatial_hm is designed to be as flexible as possible to avoid such issues. The flexibility is carried out by the arguments listed in Table 2.


Table 2: Description of arguments in “spatial_hm”.
argument description
svg.path Path of aSVG
data Input data of SummarizedExperiment (SE), data frame, or vector
sam.factor Applies to SE. Column name of sample replicates in colData slot. Default is NULL
con.factor Applies to SE. Column name of condition replicates in colData slot. Default is NULL
ID A character vector of row features for plotting spatial heatmaps
col.com A character vector of colour components for building colour scale. Default is c(‘yellow’, ‘purple’, ‘blue’)
col.bar ‘selected’ or ‘all’, the former means use values of ID to build the colour scale while the latter use all values in data. Default is ‘selected’.
bar.width A numeric of colour bar width. Default is 0.7
data.trans ‘log2’, ‘exp2’, or NULL, ‘log2’ transforms data to log2 scale for plotting while ‘exp2’ to 2-base exponent. Default is NULL, no transformation.
tis.trans A vector of aSVG features to be transparent. Default is NULL.
width, height Two numerics of width and height of spatial heatmap plots repsectively. Default is 1, 1.
legend.r The ratio aspect (width to height) of legend plot. Default is 1.
sub.title.size The title size of each spatial heatmap subplot. Default is 11.
lay.shm ‘gen’ or ‘con’, applies to multiple genes or conditions respectively. ‘gen’ means spatial heatmaps are organised by genes while ‘con’ organised by conditions.Default is ‘gen’
ncol The column number of spatial heatmaps, not including legend plot. Default is 2.
sam.legend ‘identical’, ‘all’, or a vector of samples/features in aSVG to show in legend plot. ‘identical’ only shows matching features while ‘all’ shows all features.
legend.ncol, legend.nrow Two numbers of columns and rows of legend keys respectively. Default is NULL, NULL, since they are automatically set.
legend.position the position of legend keys (‘none’, ‘left’, ‘right’,‘bottom’, ‘top’), or two-element numeric vector. Default is ‘bottom’.
legend.direction Layout of keys in legends (‘horizontal’ or ‘vertical’). Default is NULL, since it is automatically set.
legend.key.size, legend.label.size The size of legend keys and labels respectively. Default is 0.5 and 8 respectively.
line.size, line.color The size and colour of all plogyon outlines respectively. Default is 0.2 and ‘grey70’ respectively.
verbose TRUE or FALSE. Default is TRUE and the aSVG features not mapped are printed to R console.

[ThG-Comment 7: The paragraphs below have been moved here for now. There is a lot of duplication in this text that needs to be deleted (or hidden via comment tags) and/or merged into other sections where appropriate. Some of the text is certainly important but it often should be used to explain code sections rather than keeping it separate from the code. For instance the first paragraph below should go to a section where you show how to generate multiple target gene plots rather then just describing it in stand-alone text, especially since a summary of this particular example has been given in the intro section already.]

Moved

If mutiple target genes are input, a set of spatial heatmaps for each gene are plotted sequentially and organised on the same page. The lay.shm parameter specifies display these spatial heatmaps by genes or by conditions. This feature makes it flexible for users to compare expression profiles of the same gene across conditions or different genes across the same condition. For instance, Figure 4 is the spatial heatmaps of gene ENSG00000268433 and ENSG00000006047, which are organised by condition (horizontal view).

spatial_hm(svg.path=svg.hum, data=se.fil.hum, ID=c('ENSG00000268433', 'ENSG00000006047'), lay.shm='con', width=1, height=1, legend.r=1.5)
Spatial heatmaps of two genes. The subplots are organised by "condition" through `lay.shm` argument.

Figure 4: Spatial heatmaps of two genes
The subplots are organised by “condition” through lay.shm argument.

3.3 Mouse Organ

This example is based on a mouse data from an RNA-seq study aiming at assessing tissue-specific transcriptome variation across mammals (Merkin et al. 2012), which is from EBI Expression Atlas.

The following process is similar to the Human Brain example, so explanation of code chunks is simplified to avoid lengthy text.

Search Expression Atlas for expression data derived from ‘hear’ and ‘Mus musculus’.

all.mus <- searchAtlasExperiments(properties="heart", species="Mus musculus")

Among the results, select ‘E-MTAB-2801’.

all.mus[7, ]
## DataFrame with 1 row and 4 columns
##     Accession      Species                  Type
##   <character>  <character>           <character>
## 1 E-MTAB-2801 Mus musculus RNA-seq of coding RNA
##                                           Title
##                                     <character>
## 1 Strand-specific RNA-seq of nine mouse tissues
rse.mus <- getAtlasData('E-MTAB-2801')[[1]][[1]]

A slice of the experiment design, which is stored in colData slot.

colData(rse.mus)[1:3, ]
## DataFrame with 3 rows and 4 columns
##           AtlasAssayGroup     organism organism_part      strain
##               <character>  <character>   <character> <character>
## SRR594393              g7 Mus musculus         brain      DBA/2J
## SRR594394             g21 Mus musculus         colon      DBA/2J
## SRR594395             g13 Mus musculus         heart      DBA/2J

Make a targets file based on the experiment design. It is included in this package and part is shown below.

mus.tar <- system.file('extdata/shinyApp/example/target_mouse.txt', package='spatialHeatmap')
target.mus <- read.table(mus.tar, header=TRUE, row.names=1, sep='\t')
target.mus[1:3, ]
##           AtlasAssayGroup     organism organism_part strain
## SRR594393              g7 Mus musculus         brain DBA.2J
## SRR594394             g21 Mus musculus         colon DBA.2J
## SRR594395             g13 Mus musculus         heart DBA.2J
# Tissue. 
unique(target.mus[, 3])
## [1] "brain"           "colon"           "heart"          
## [4] "kidney"          "liver"           "lung"           
## [7] "skeletal.muscle" "spleen"          "testis"

Use the targets file to replace the data frame in colData slot.

colData(rse.mus) <- DataFrame(target.mus)

Pre-process the raw count matrix: normalise, aggregate, filter. Genes with expression values larger than 5 in at least 1% of all samples (pOA=c(0.01, 5)), and with coefficient of variance (CV) between 0.60 and 100 (CV=c(0.6, 100)) are retained.

# Normalise.
se.nor.mus <- norm_data(data=rse.mus, norm.fun='ESF', data.trans='log2')
## Normalising: ESF 
##    type 
## "ratio"
# Aggregate replicates.
se.aggr.mus <- aggr_rep(data=se.nor.mus, sam.factor='organism_part', con.factor='strain', aggr='mean')
# Filter genes with low variance and low counts.
se.fil.mus <- filter_data(data=se.aggr.mus, sam.factor='organism_part', con.factor='strain', pOA=c(0.01, 5), CV=c(0.6, 100), dir=NULL)

Download aSVGs from EBI repository directly. An empty directory ‘~/test’ is suggested to save the downloaded files so as to avoid overwriting existing files.

# Make an empty directory "~/test" if not exist.
if (!dir.exists('~/test')) dir.create('~/test')
# Query aSVGs.
feature.df <- return_feature(feature=c('heart', 'kidney'), species=c('Mus musculus'), keywords.all=FALSE, return.all=FALSE, dir='~/test', remote=TRUE, match.only=FALSE)

As explained in the toy example, the same aSVG image has been included in this package. To meet the requirements for building vignettes in R packages, the following code section uses the packaged instance of the aSVG files (i.e. local aSVG) rather than downloading it. To explain how to select a certain aSVG among the returned resutls, species is set NULL on purpose so that all aSVGs marching the feature keywords are returned.

feature.df <- return_feature(feature=c('heart', 'kidney'), species=NULL, keywords.all=FALSE, return.all=FALSE, dir=svg.dir, remote=FALSE, match.only=FALSE) 
## Accessing features... 
## gallus_gallus.svg, homo_sapiens.brain.svg, mus_musculus.male.svg, organ_final.svg, root_cross_final.svg, root_roottip_final.svg, shoot_final.svg, shoot_root_final.svg, us_map_final.svg,

All aSVG files matching the keywords.

unique(feature.df$SVG)
## [1] "gallus_gallus.svg"     "mus_musculus.male.svg"

Select ‘mus_musculus.male.svg’ as the target aSVG.

feature.df <- subset(feature.df, SVG=='mus_musculus.male.svg')
# A slice of the feature data frame.
feature.df[1:3, ]
##     feature       id                   SVG    parent index
## 10   kidney   kidney mus_musculus.male.svg container   257
## 11    heart    heart mus_musculus.male.svg container   261
## 12 outline1 outline1 mus_musculus.male.svg container     1
##    index1
## 10    257
## 11    261
## 12      1

Get the target aSVG path.

svg.mus <- system.file("extdata/shinyApp/example", "mus_musculus.male.svg", package="spatialHeatmap")

Plot spatial heatmaps.

Mouse organ spatial heatmap. This is a multiple-layer image and `skeletal.muscle` is set transparent to expose lung and heart.

Figure 5: Mouse organ spatial heatmap
This is a multiple-layer image and skeletal.muscle is set transparent to expose lung and heart.

The spatial heatmap of gene ENSMUSG00000000263 is plotted in 8 tissues across 3 strains. It is manifest that only brain exhibits obvious difference across the 3 strains with DBA.2J, C57BL.6, and CD1 being highest, medium, and lowest respectively. In contrast, all the other 8 tissues display similar profile across strains. Thus this gene is potentially strain-specific. Moreover, the expression levels of all the other 8 tissues are all lower than brain.

This is a typical example to demonstrate the usage of tis.trans parameter, since this mouse organ image includes tissues in multiple layers and skelectal muscle covers lung and heart. In Figure 5, skelectal muscle is set transparent through tis.trans=c('skeletal.muscle') so that lung and heart are exposed. By contrast, in Figure 6 tis.trans=NULL exposes skeletal.muscle and lung and heart are covered.

Moreover, presence of too many tissues might affect the visual effects due to the messy polygon outlines. The line.size and line.color parameters are used to adjust the thickness and color of polygon outlines respectively and thus enhance the visualisation. In Figure 2, the default values of the 2 arguments are used.

Mouse organ spatial heatmap. This is a multiple-layer image and `skeletal.muscle` convers lung and heart.

Figure 6: Mouse organ spatial heatmap
This is a multiple-layer image and skeletal.muscle convers lung and heart.

3.4 Chicken Organ

In this example, the data come from developments of 7 chicken organs under 9 time points (Cardoso-Moreira et al. 2019), which is an RNA-seq analysis and accessed from EBI Expression Atlas.

The following process is similar to the Human Brain example, so explanation of code chunks is simplified to avoid lengthy text.

Search Expression Atlas for expression data derived from ‘hear’ and ‘gallus’.

all.chk <- searchAtlasExperiments(properties="heart", species="gallus")

Among the results, select ‘E-MTAB-6769’.

all.chk[3, ]
## DataFrame with 1 row and 4 columns
##     Accession       Species                  Type
##   <character>   <character>           <character>
## 1 E-MTAB-6769 Gallus gallus RNA-seq of coding RNA
##                                                                  Title
##                                                            <character>
## 1 Chicken RNA-seq time-series of the development of seven major organs
rse.chk <- getAtlasData('E-MTAB-6769')[[1]][[1]]

A slice of the experiment design, which is stored in colData slot.

colData(rse.chk)[1:3, ]
## DataFrame with 3 rows and 8 columns
##            AtlasAssayGroup      organism         strain
##                <character>   <character>    <character>
## ERR2576379              g1 Gallus gallus Red Junglefowl
## ERR2576380              g1 Gallus gallus Red Junglefowl
## ERR2576381              g2 Gallus gallus Red Junglefowl
##                      genotype developmental_stage         age
##                   <character>         <character> <character>
## ERR2576379 wild type genotype              embryo      10 day
## ERR2576380 wild type genotype              embryo      10 day
## ERR2576381 wild type genotype              embryo      10 day
##                    sex organism_part
##            <character>   <character>
## ERR2576379      female         brain
## ERR2576380      female         brain
## ERR2576381      female    cerebellum

Make a targets file based on the experiment design. It is included in this package and part is shown below.

chk.tar <- system.file('extdata/shinyApp/example/target_chicken.txt', package='spatialHeatmap')
target.chk <- read.table(chk.tar, header=TRUE, row.names=1, sep='\t')
target.chk[1:3, ]
##            AtlasAssayGroup      organism         strain
## ERR2576379              g1 Gallus gallus Red Junglefowl
## ERR2576380              g1 Gallus gallus Red Junglefowl
## ERR2576381              g2 Gallus gallus Red Junglefowl
##                      genotype developmental_stage   age    sex
## ERR2576379 wild type genotype              embryo day10 female
## ERR2576380 wild type genotype              embryo day10 female
## ERR2576381 wild type genotype              embryo day10 female
##            organism_part
## ERR2576379         brain
## ERR2576380         brain
## ERR2576381    cerebellum

Use the targets file to replace the data frame in colData slot.

colData(rse.chk) <- DataFrame(target.chk)

All samples used for plotting spatial heatmaps.

unique(colData(rse.chk)[, 'organism_part'])
## [1] "brain"      "cerebellum" "heart"      "kidney"    
## [5] "ovary"      "testis"     "liver"

All conditions used for plotting spatial heatmaps.

unique(colData(rse.chk)[, 'age'])
## [1] "day10"  "day12"  "day14"  "day17"  "day0"   "day155"
## [7] "day35"  "day7"   "day70"

Pro-process data matrix: normalise, aggregate, filter. Genes with expression values larger than 5 in at least 1% of all samples (pOA=c(0.01, 5)), and with coefficient of variance (CV) between 0.6 and 100 (CV=c(0.6, 100)) are retained.

# Normalise.
se.nor.chk <- norm_data(data=rse.chk, norm.fun='ESF', data.trans='log2')
## Normalising: ESF 
##    type 
## "ratio"
# Aggregate replicates. 
se.aggr.chk <- aggr_rep(data=se.nor.chk, sam.factor='organism_part', con.factor='age', aggr='mean')
# Filter genes with low variance and low counts.
se.fil.chk <- filter_data(data=se.aggr.chk, sam.factor='organism_part', con.factor='age', pOA=c(0.01, 5), CV=c(0.6, 100), dir=NULL)

Download aSVGs from EBI repository directly. An empty directory ‘~/test’ is suggested to save the downloaded files so as to avoid overwriting existing files.

# Make an empty directory "~/test" if not exist.
if (!dir.exists('~/test')) dir.create('~/test')
# Query aSVGs.
feature.df <- return_feature(feature=c('heart', 'kidney'), species=c('gallus'), keywords.all=FALSE, return.all=FALSE, dir='~/test', remote=TRUE, match.only=FALSE)

As explained in the toy example, the target aSVG image has been included in this package, and will be used for plotting spatial heatmaps.

feature.df <- return_feature(feature=c('heart', 'kidney'), species=c('gallus'), keywords.all=FALSE, return.all=FALSE, dir=svg.dir, remote=FALSE, match.only=FALSE)
## Accessing features... 
## gallus_gallus.svg, homo_sapiens.brain.svg, mus_musculus.male.svg, organ_final.svg, root_cross_final.svg, root_roottip_final.svg, shoot_final.svg, shoot_root_final.svg, us_map_final.svg,
feature.df
##                 feature                    id               SVG
## 1                 heart                 heart gallus_gallus.svg
## 2                kidney                kidney gallus_gallus.svg
## 3       chicken_outline       chicken_outline gallus_gallus.svg
## 4                 brain                 brain gallus_gallus.svg
## 5                 liver                 liver gallus_gallus.svg
## 6 skeletal.muscle.organ skeletal.muscle.organ gallus_gallus.svg
## 7                 colon                 colon gallus_gallus.svg
## 8                spleen                spleen gallus_gallus.svg
## 9                  lung                  lung gallus_gallus.svg
##      parent index index1
## 1 container     3      3
## 2 container     4      4
## 3 container     1      1
## 4 container     2      2
## 5 container     5      5
## 6 container     6      6
## 7 container     7      7
## 8 container     8      8
## 9 container     9      9

Get the target aSVG path.

svg.chk <- system.file("extdata/shinyApp/example", "gallus_gallus.svg", package="spatialHeatmap")

Plot spatial heatmaps.

## Enrties not mapped: cerebellum, ovary, testis
Example of plotting chicken organ spatial heatmaps. Liver in day10 is not plotted since this tissue in day10 in not available in the data matrix.

Figure 7: Example of plotting chicken organ spatial heatmaps
Liver in day10 is not plotted since this tissue in day10 in not available in the data matrix.

The spatial heatmap of gene ENSGALG00000006346 is plotted. It is intuitive that the profiles of liver, heart, and kidney are all higher in day17 than other days. Therefore, the important role of this gene in day10 is worth futher exploration. In day10 liver is blank, because in the expression matrix liver data is not availble for day10. This reflects the plotting algorithm that only matching samples between the data and SVG image are plotted.

In this example, the usage of argument ncol is exhibited on how to achieve optimal layout. There are 9 time conditions, so ncol=3 is set to make make full use of the space.

3.5 Arabidopsis Shoot

GEO is a another well-known public repository of array- and sequence-based data. To demonstrate the use of spatialHeatmap on this resource, the dataset GSE14502 is plotted on a shoot aSVG image. It a microarray data from a study of translatome variation of Arabidopsis thaliana (Arabidopsis) shoot and root tissues under control and hypoxia conditions (Mustroph et al. 2009), and is downloaded through GEOquery (S. Davis and Meltzer 2007).

Access the GEO dataset GSE14502 and convert it to SummarizedExperiment.

gset <- getGEO("GSE14502", GSEMatrix=TRUE, getGPL=TRUE)[[1]]
se.sh <- as(gset, "SummarizedExperiment")

Use gene symbols to replace probes.

rownames(se.sh) <- make.names(rowData(se.sh)[, 'Gene.Symbol'])

A slice of the experiment design, which is stored in colData slot.

colData(se.sh)[1:3, 1:4]
## DataFrame with 3 rows and 4 columns
##                             title geo_accession
##                       <character>   <character>
## GSM362168 root_control_total_rep1     GSM362168
## GSM362169 root_control_total_rep2     GSM362169
## GSM362170 root_control_total_rep3     GSM362170
##                          status submission_date
##                     <character>     <character>
## GSM362168 Public on Oct 12 2009     Jan 21 2009
## GSM362169 Public on Oct 12 2009     Jan 21 2009
## GSM362170 Public on Oct 12 2009     Jan 21 2009

Make a targets file based on the title column in experiment design. It is included in this package and part is shown below.

sh.tar <- system.file('extdata/shinyApp/example/target_arab.txt', package='spatialHeatmap')
target.sh <- read.table(sh.tar, header=TRUE, row.names=1, sep='\t')
target.sh[60:63, ]
##                           col.name     samples conditions
## shoot_hypoxia_pGL2_rep1  GSM362227  shoot_pGL2    hypoxia
## shoot_hypoxia_pGL2_rep2  GSM362228  shoot_pGL2    hypoxia
## shoot_control_pRBCS_rep1 GSM362229 shoot_pRBCS    control
## shoot_control_pRBCS_rep2 GSM362230 shoot_pRBCS    control

All samples in targets file. Promoter pGL2, pCO2, pSCR, pWOL labels root atrichoblast epidermis, root cortex meristematic zone, root endodermis, root vasculature respectively.

unique(target.sh[, 'samples'])
##  [1] "root_total"      "root_p35S"       "root_pSCR"      
##  [4] "root_pSHR"       "root_pWOL"       "root_pGL2"      
##  [7] "root_pSUC2"      "root_pSultr2.2"  "root_pCO2"      
## [10] "root_pPEP"       "root_pRPL11C"    "shoot_total"    
## [13] "shoot_p35S"      "shoot_pGL2"      "shoot_pRBCS"    
## [16] "shoot_pSUC2"     "shoot_pSultr2.2" "shoot_pCER5"    
## [19] "shoot_pKAT1"

All conditions in targets file.

unique(target.sh[, 'conditions'])
## [1] "control" "hypoxia"

Use the targets file to replace the data frame in colData slot.

colData(se.sh) <- DataFrame(target.sh)

The dataset GSE14502 is already normalised by RMA (Gautier et al. 2004), so the pro-processing only includes aggregation and filtering. Genes with expression values larger than 6 in at least 3% of all samples (pOA=c(0.03, 6)), and with coefficient of variance (CV) between 0.30 and 100 (CV=c(0.30, 100)) are retained.

# Aggregate replicates. 
se.aggr.sh <- aggr_rep(data=se.sh, sam.factor='samples', con.factor='conditions', aggr='mean')
# Filter genes with low variance and low intensity.
se.fil.sh <- filter_data(data=se.aggr.sh, sam.factor='samples', con.factor='conditions', pOA=c(0.03, 6), CV=c(0.30, 100), dir=NULL)

The aSVG image is made from Mustroph et al. (2009) and included in this packaged. Thus it can be queried locally.

feature.df <- return_feature(feature=c('shoot_pGL2', 'shoot_pRBCS'), species=c('shoot'), keywords.all=FALSE, return.all=FALSE, dir=svg.dir, remote=FALSE, match.only=FALSE)
## Accessing features... 
## gallus_gallus.svg, homo_sapiens.brain.svg, mus_musculus.male.svg, organ_final.svg, root_cross_final.svg, root_roottip_final.svg, shoot_final.svg, shoot_root_final.svg, us_map_final.svg,

All matching aSVGs.

unique(feature.df$SVG)
## [1] "shoot_final.svg"      "shoot_root_final.svg"

Select ‘shoot_final.svg’ for plotting spaital heatmaps.

feature.df <- subset(feature.df, SVG=='shoot_final.svg')
# A slice of the feature data frame.
feature.df[1:3, ]
##       feature          id             SVG parent index index1
## 1  shoot_pGL2  shoot_pGL2 shoot_final.svg  large     2      2
## 2 shoot_pRBCS shoot_pRBCS shoot_final.svg  large     3      3
## 3 shoot_pCER5 shoot_pCER5 shoot_final.svg  large     4      4

Get path of ‘shoot_final.svg’.

svg.sh <- system.file("extdata/shinyApp/example", "shoot_final.svg", package="spatialHeatmap")

Plot spatial heatmaps.

spatial_hm(svg.path=svg.sh, data=se.fil.sh, ID=c("HRE2"), height=0.6, legend.nrow=3, legend.r=1.3, legend.key.size=0.3)
## Enrties not mapped: root_total, root_p35S, root_pSCR, root_pSHR, root_pWOL, root_pGL2, root_pSUC2, root_pSultr2.2, root_pCO2, root_pPEP, root_pRPL11C, shoot_total, shoot_p35S
Spatial heatmaps of Arabidopsis shoot. Pre-defined tissue regions are colored by the expression profile of the target gene. The promoter pGL2, pRBCS, pCER5, pSultr2.2, pSUC2, pKAT1 label shoot trichomes, shoot photosynthetic cell, cotyledon and leaf epidermis, shootbundle sheath, shoot phloem companion cells, Cotyledon and leaf guard cells, respectively.

Figure 8: Spatial heatmaps of Arabidopsis shoot
Pre-defined tissue regions are colored by the expression profile of the target gene. The promoter pGL2, pRBCS, pCER5, pSultr2.2, pSUC2, pKAT1 label shoot trichomes, shoot photosynthetic cell, cotyledon and leaf epidermis, shootbundle sheath, shoot phloem companion cells, Cotyledon and leaf guard cells, respectively.

Figure 8 is the spatial heatmap of gene ‘HRE2’ under control and hypoxia. It is clear that this gene's exression profiles under control are lower than hypoxia across all the 5 tissues (pGL2, pRBCS, pCER5, pSUC2, pKAT1). Therefore, it can be hypothesised that hypoxia induces over-expression of ‘HRE2’ across the 5 tissues and thus ‘HRE2’ might be an important factor for Arabidopsis shoot to cope with hypoxia stress. The tissue pSultr2.2 is blank under hypoxia due to unavailability of its data under hypoxia in the data matrix.

jianhai: From here to Shiny App section, slight changes are made.

4 Matrix Heatmaps

The Matrix Heatmap is designed to supplement the core feature of spatial heatmap. It displays the target gene in the context of corresponding gene network module, so there is a process of gene modules identification.

Adjacency Matrix and Module Identification

The modules are identified by adj_mod. It first computes an adjacency matrix on the gene expression matrix then hierarchically clusters the adjacency matrix by using WGCNA (Langfelder and Horvath 2008) and flashClust (Langfelder and Horvath 2012). The clutersing includes 4 alternative sensitivity levels (ds=0, 1, 2, or 3). From 3 to 0, the sensitivity decreases and results in less modules with larger sizes. Since the interactive network functionality performs better on smaller modules, only ds of 3 and 2 are used. There is another parameter type for module identification: signed and unsinged. The signed means both positive and negative adjacency between genes are used while the unsigned takes the absolute values of negative adjacency.

The function adj_mod returns a list containing an adjacency matrix and a data frame of module assignment. It is domenstrated on the Arabidopsis Shoot data.

adj.mod <- adj_mod(data=se.fil.sh, type="signed", minSize=15, dir=NULL)

The adjacency matrix is a measure of co-expression similarity between genes, where larger value denotes more similarity.

adj.mod[['adj']][1:3, 1:3]
##           ndhA      petL      psaJ
## ndhA 1.0000000 0.5374043 0.6088355
## petL 0.5374043 1.0000000 0.7779227
## psaJ 0.6088355 0.7779227 1.0000000

The module assignment is a data frame. The first column is ds=2 while the second is ds=3. The numbers in each column are module labels with “0” meaning genes not assigned to any modules.

adj.mod[['mod']][1:3, ]
##      2 3
## ndhA 1 0
## petL 1 0
## psaJ 1 0

The matrix heatmap is implemented in function matrix_hm with 2 modes provided: static or interactive. Figure 9 is the static mode on gene ‘HRE2’. Setting static=FALSE launches the interactive mode, where users can zoom in and out by drawing a rectangle and by double clicking the heatmap, respectively.

matrix_hm(geneID="HRE2", data=se.fil.sh, adj.mod=adj.mod, ds="3", scale="no", angleCol=80, angleRow=35, cexRow=0.8, cexCol=0.8, margin=c(10, 6), static=TRUE, arg.lis1=list(offsetRow=0.1, offsetCol=0.1))
Matrix Heatmap. Rows are genes and columns are samples. The input gene is tagged by 2 black lines.

Figure 9: Matrix Heatmap
Rows are genes and columns are samples. The input gene is tagged by 2 black lines.

In Figure 9, the target gene is displayed in the gene module it belongs to, which is indicated by 2 black lines. The rows and columns are sorted by hierarchical clustering dendrograms. The expression matrix of this module is visualised without being scaled (scale="no"). It can be seen that the expression levels of this module is overall much higher in hypoxia than control, and therefore it could potentially be used to infer the hypoxia response mechanism in Arabidopsis.

5 Network Graphs

The same target gene and module from matrix heatmap can also be displayed as a network. Similarly, the network can be dispayed in static or interactive mode.

Setting static=TRUE launches the static network. In Figure 10 Nodes are genes and edges are adjacencies between genes. The thicker edge denotes higher adjacency (co-expression similarity) while larger node indicates higher gene connectivity (sum of a gene's adjacency with all its direct neighbours). The target gene is labeled by ’_selected’.

network(geneID="HRE2", data=se.fil.sh, ann=NULL, adj.mod=adj.mod, ds="3", adj.min=0.75, con.min=0, vertex.label.cex=1.2, vertex.cex=2, static=TRUE)
Static network. Node size denotes gene connectivity while edge thickness stands for co-expression similarity.

Figure 10: Static network
Node size denotes gene connectivity while edge thickness stands for co-expression similarity.

Setting static=FALSE launches the interactive network. There is an interactive color bar to denote gene connectivity. The color ingredients must only be separated by comma, e.g. ‘yellow,black,purple’, which means gene connectivity increases from yellow to purple. If too many edges (e.g.: > 300) are displayed, the network could get stuck. So the ‘Input an adjacency threshold to display the adjacency network.’ option sets a threthold to filter out weak edges. If not too many edges retained (e.g.: < 300), users can check ‘Yes’ under ‘Display or not?’, then the network would be responsive smoothly. To maintain acceptable performance, users are advised to choose a stringent threshold (e.g. 0.9) initially, then decrease the value gradually. The interactive feature allows users to zoom in and out, or drag a gene around. All the gene IDs in the network module are listed in ‘Select by id’ in decreasing order according to gene connectivity. Same with static mode, the target gene ID is appended ’_selected’.

If gene annotation is available in rowData slot and provided to ann argument, the annotation is seen by mousing over a node. In this example, Target.Description in rowData is provided to ann.

network(geneID="HRE2", data=se.fil.sh, ann='Target.Description', adj.mod=adj.mod, adj.min=0.75, con.min=0, vertex.label.cex=1.2, vertex.cex=2, static=FALSE)

6 Shiny App

All the above functionality (spatial heatmap, interactive matrix heatmap, interactive network) is also combined into a web-browser based Shiny App, which takes advantage of the computational power of R and interactivity of the web. The main benefits of the Shiny App is combine all the utities in one interface and increase interactivity. On the left of this app is the menu. It includes pre-formatted ready-to-use examples, options to upload formatted data matrix and aSVG images, and instruction to use this app. On the right is the interactive interfacce, including Data Matrix, Spatial Heatmap, Matrix Heatmap, and Network. To use interactive features, there are paramters on the left menu to operate. Upon launched, the app automatically displays a pre-formatted example. A good practice to use this app is to follow steps in the menu rather than skipping steps. If unexpectation happens, the app webpage should be refreshed.

This app is launched by the function shiny_all without any parameters. Figure 11 is the screenshot of Spatial Heatmap.

shiny_all()
The snapshot of Shiny App. Left is the menu and right is the Spatial Heatmap.

Figure 11: The snapshot of Shiny App
Left is the menu and right is the Spatial Heatmap.

The data matrix to upload is a data frame and can be obtained by setting a directory path to dir in function filter_data. A directory ‘local_mode_result/’ is automatically created in the provided path, and the filtered data matrix is written to ‘local_mode_result/processed_data.txt’ with column names in the syntax 1">’sample__condition’, which is a tab-separated file. If users want to see annotation by mousing over a node in the network, then a column of gene annotation in rowData slot should be provided to ann, and the annotation is appended to the last column in ‘processed_data.txt’.

For example, in filter_data, setting dir='./' (current working directory) will output the filtered data matrix in ‘./local_mode_result/processed_data.txt’, and setting ann='Target.Description' appends the annotation from rowData slot to the last column of ‘processed_data.txt’, which is ready to upload to the app.

se.fil.sh <- filter_data(data=se.aggr.sh, ann="Target.Description", sam.factor='samples', con.factor='conditions', pOA=c(0.03, 6), CV=c(0.30, 100), dir='./')

jianhai: the following Supplement section is substantially changed.

7 Supplement

To plot spatial heatmaps, a pair of data (vector, data frame, SummarizedExperiment) and aSVG image are required. The most important step is to format the data and aSVG image so that target features in aSVG have matching counterparts in the data, since only the matching features in aSVG images are coloured. This section explains details of data and aSVG setup that are not covered in the main vignette.

7.1 Format the Data

The accepted data classes include vector, data frame, or SummarizedExperiment (SE). Vector applies to several numeric values measured from a single sample, and data frame applies to several samples and/or several conditions (e.g. 2 samples under 2 conditions) regardless of row features. By contrast, SE applies to experiments with many samples and many conditions. Formatting the data is essentially define samples and/or conditions.

Vector
In the case of vector, the numeric values are measured from different samples. If one or more conditions are provided, the samples and conditions should be connected by double undescore, i.e. in the form of ’sample__condition’. If no conditions are provided, all the samples are assumed to have same condition, which is the toy example.

Take 2 samples ‘occipital_lobe’ and ‘parietal_lobe’ from the toy example for instance and assume there are 2 conditions, ‘condition1’ and ‘condition2’. Select 5 random values, assign 4 of them to the 2 samples under the 2 conditions, and the last one to a not-mapped sample. Note the value names should be unique.

# Random numeric values.
vec <- sample(x=1:100, size=5)
# Give unique names to random values.
names(vec) <- c('occipital_lobe__condition1', 'occipital_lobe__condition2', 'parietal_lobe__condition1', 'parietal_lobe__condition2', 'notMapped')
vec
## occipital_lobe__condition1 occipital_lobe__condition2 
##                          2                         68 
##  parietal_lobe__condition1  parietal_lobe__condition2 
##                         14                         30 
##                  notMapped 
##                         32

Plot spatial heatmaps.

spatial_hm(svg.path=svg.hum, data=vec, ID='toy', ncol=1, legend.r=1.2, sub.title.size=14)
## Enrties not mapped: notMapped
Spatial heatmaps on a vector. 'occipital_lobe' and 'parietal_lobe' are 2 aSVG features and 'condition1' and 'condition2' are conditions.

Figure 12: Spatial heatmaps on a vector
‘occipital_lobe’ and ‘parietal_lobe’ are 2 aSVG features and ‘condition1’ and ‘condition2’ are conditions.

Data Frame
In the case of data frame, numeric values are measured from different samples. Similarly, if one or more conditions are provided, the column names should be in the form of ’sample__condition’. If no conditions are provided, all the samples are assumed to have same condition.

Take the same samples and conditions in the vector case as example.

Make a numeric data frame of 20 rows and 5 columns. Name columns with the value names (each is unique) from above vector and rows with 20 genes (gene1, gene2, …, gene20).

# Make a numeric data frame.
df.test <- data.frame(matrix(sample(x=1:1000, size=100), nrow=20))
# Name the columns.
colnames(df.test) <- names(vec)
# Name the rows.
rownames(df.test) <- paste0('gene', 1:20)
# A slice of the data frame.
df.test[1:3, ]
##       occipital_lobe__condition1 occipital_lobe__condition2
## gene1                        370                        330
## gene2                        559                        130
## gene3                        477                        453
##       parietal_lobe__condition1 parietal_lobe__condition2
## gene1                       867                       155
## gene2                       277                       181
## gene3                       802                       696
##       notMapped
## gene1       470
## gene2       892
## gene3       784

In the downstream interactive network, if users want to have a gene annotation by mousing over a node, a column of gene annotation can be appended to the data frame. For example, the 20 genes are annotated as ann1, ann2, …, ann20.

df.test$ann <- paste0('ann', 1:20)
df.test[1:3, ]
##       occipital_lobe__condition1 occipital_lobe__condition2
## gene1                        370                        330
## gene2                        559                        130
## gene3                        477                        453
##       parietal_lobe__condition1 parietal_lobe__condition2
## gene1                       867                       155
## gene2                       277                       181
## gene3                       802                       696
##       notMapped  ann
## gene1       470 ann1
## gene2       892 ann2
## gene3       784 ann3

Plot spatial heatmaps on ‘gene1’.

spatial_hm(svg.path=svg.hum, data=df.test, ID=c('gene1'), ncol=1, legend.r=1.2, sub.title.size=14)
## Enrties not mapped: notMapped
Spatial heatmaps on a data frame. 'occipital_lobe' and 'parietal_lobe' are 2 aSVG features and 'condition1' and 'condition2' are conditions.

Figure 13: Spatial heatmaps on a data frame
‘occipital_lobe’ and ‘parietal_lobe’ are 2 aSVG features and ‘condition1’ and ‘condition2’ are conditions.

SummarizedExperiment

In the following, the same samples and conditions in the above data frame are taken as example.

Formatting data of SummarizedExperiment (SE, Morgan et al. (2018)) is essentially to make a targets file (a data frame of column metadata). The targets file usually has at least 2 columns that specifies sample and condition replicates respectively, and should be added to the colData slot. The data matrix should have genes and sample/conditions in rows and columns respectively, and must be in the assay slot. The rowData slot can store a data frame of annotaions corresponding to rows in assay slot, but is not required.

To plot spaital heatmap successfully, the targets file should meet the following requirements.

  1. It is a data frame and usually has at least one column of samples and one column of conditions. The rows correspond with columns in assay slot. If the condition column is not defined, the samples are assumped under same condition.

  2. The sample column specifies sample replicates. It is crucial that replicate names of the same sample must be identical. Otherwise, they are treated as different samples. E.g. ‘occipital_lobe’, ‘occipital_lobe’ are the same sample while ‘occipital_lobe1’, ‘occipital_lobe2’ are different samples.

  3. The sample identifiers of interest must be identical with features of interest in aSVG respectively. It means even a dot, undescore, space, etc can make a difference and lead to target features not coloured in spatial heatmaps. Since double underscore (__) is a reserved separator in spatialHeatmap, it cannot be used in sample or condition identifiers.

  4. The condition column has the same requirement with the sample column. E.g. ‘condition1’, ‘condition1’ is same conditoin while ‘condition1A’, ‘condition1B’ is treated as different conditions.

In the following example, ‘occipital_lobe’ has 2 conditions ‘condition1’ and ‘condition2’, and each condition has 2 replicates, so there are 4 assays for ‘occipital_lobe’. The same applies to ‘parietal_lobe’. Based on this experiment design, the corresponding targets file is made, where a row is an assay.

# Two samples.
sample <- c(rep('occipital_lobe', 4), rep('parietal_lobe', 4))
# Two conditions.
condition <- rep(c('condition1', 'condition1', 'condition2', 'condition2'), 2)
# Targets file.
target.test <- data.frame(sample=sample, condition=condition, row.names=paste0('assay', 1:8))
target.test
##                sample  condition
## assay1 occipital_lobe condition1
## assay2 occipital_lobe condition1
## assay3 occipital_lobe condition2
## assay4 occipital_lobe condition2
## assay5  parietal_lobe condition1
## assay6  parietal_lobe condition1
## assay7  parietal_lobe condition2
## assay8  parietal_lobe condition2

Make a random numeric data frame of 8 columns and 20 rows. Each column is an assay and each row is a gene's expression profile. Columns must correspond with rows in targets file, so column names are assigned assay1-8.

# Make a numeric data frame.
df.se <- data.frame(matrix(sample(x=1:1000, size=160), nrow=20))
# Name the rows.
rownames(df.se) <- paste0('gene', 1:20)
# Replace the default column names. 
colnames(df.se) <- row.names(target.test)
# A slice of the data frame.
df.se[1:3, ]
##       assay1 assay2 assay3 assay4 assay5 assay6 assay7 assay8
## gene1    131    312    700    275    663    909    950    458
## gene2    199    712    614    831    149    976    183    386
## gene3    315    209    326    245    146    419    279    817
se <- SummarizedExperiment(assays=df.se, colData=target.test)
se
## class: SummarizedExperiment 
## dim: 20 8 
## metadata(0):
## assays(1): ''
## rownames(20): gene1 gene2 ... gene19 gene20
## rowData names(0):
## colnames(8): assay1 assay2 ... assay7 assay8
## colData names(2): sample condition

Similarly, in the downstream interactive network, if users want to have a gene annotation by mousing over a node, a data frame of gene annotation can be added to rowData slot, i.e. the ann column in df.test.

rowData(se) <- df.test['ann']

In this simple example, the normalization and filtering process is left out, but replicates should be aggregated. In function aggr_rep, the sample and condition columns in targets file are concatenated with double underscore to form ’sample__condition’ replicates for aggregating.

se.aggr <- aggr_rep(data=se, sam.factor='sample', con.factor='condition', aggr='mean')
assay(se.aggr)[1:3, ]
##       occipital_lobe__condition1 occipital_lobe__condition2
## gene1                      221.5                      487.5
## gene2                      455.5                      722.5
## gene3                      262.0                      285.5
##       parietal_lobe__condition1 parietal_lobe__condition2
## gene1                     786.0                     704.0
## gene2                     562.5                     284.5
## gene3                     282.5                     548.0

Plot spatial heatmaps on ‘gene1’.

spatial_hm(svg.path=svg.hum, data=se.aggr, ID=c('gene1'), ncol=1, legend.r=1.2, sub.title.size=14)
Spatial heatmaps on a SummarizedExperiment. 'occipital_lobe' and 'parietal_lobe' are 2 aSVG features and 'condition1' and 'condition2' are conditions.

Figure 14: Spatial heatmaps on a SummarizedExperiment
‘occipital_lobe’ and ‘parietal_lobe’ are 2 aSVG features and ‘condition1’ and ‘condition2’ are conditions.

7.2 aSVG repository

The aSVG repository is from EBI Gene Expression Group, where the requirements on aSVG format are included. It contains aSVGs across different species and can be downloaded with funtion return_feature directly. If users cannot find a target aSVG in this repository, there is a step-by-step SVG tutorial to create custom aSVG images, which is developed by this project.

jianhai: This tutorial is made based on previous SVG format. It is being updated to be compatible with EBI SVG format.

7.3 Update aSVG features

To change existing feature identifiers in aSVG, the function update_feature should be used. For testing purpose, an empty folder ‘~/test1’ is created and a copy of the aSVG ‘homo_sapiens.brain.svg’ packaged in spatialHeatmap is saved in there.

# Make an empty directory.
dir.create('~/test1')
# Copy the "homo_sapiens.brain.svg" aSVG.
svg.hum <- system.file("extdata/shinyApp/example", 'homo_sapiens.brain.svg', package="spatialHeatmap")
file.copy(from=svg.hum, to='~/test1', overwrite=FALSE)
## [1] FALSE

Use feature and species keywords (‘frontal cortex’ and ‘homo sapiens’) to query the aSVG and return existing features, which is a data frame.

feature.df <- return_feature(feature=c('frontal cortex'), species=c('homo sapiens'), dir='~/test1', remote=FALSE)
## Accessing features... 
## homo_sapiens.brain.svg,
feature.df
##             feature                id                    SVG
## 1    frontal.cortex    frontal_cortex homo_sapiens.brain.svg
## 2 prefrontal.cortex prefrontal_cortex homo_sapiens.brain.svg
##      parent index index1
## 1 container     5      5
## 2 container    26     26

Make a vector of new feature identifiers corresponding to every returned feature, e.g. replacing undescores with dots. This vector must be added to the first column of the feature data frame, since that is where update_feature looks for new features. Then features are updated with update_feature.

# A vector of new features.
f.new <- c('frontal.cortex', 'prefrontal.cortex')

# New features added to the first column of feature data frame.
feature.df.new <- cbind(featureNew=f.new, feature.df)
feature.df.new
##          featureNew           feature                id
## 1    frontal.cortex    frontal.cortex    frontal_cortex
## 2 prefrontal.cortex prefrontal.cortex prefrontal_cortex
##                      SVG    parent index index1
## 1 homo_sapiens.brain.svg container     5      5
## 2 homo_sapiens.brain.svg container    26     26
# Update the features.
update_feature(feature=feature.df.new, dir='~/test1')
## Setting titles 'frontal.cortex, prefrontal.cortex' in ~/test1/homo_sapiens.brain.svg


# Version Informaion
sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils    
## [7] datasets  methods   base     
## 
## other attached packages:
##  [1] GEOquery_2.54.1             ExpressionAtlas_1.14.0     
##  [3] xml2_1.2.2                  limma_3.42.2               
##  [5] SummarizedExperiment_1.16.1 DelayedArray_0.12.2        
##  [7] BiocParallel_1.20.1         matrixStats_0.55.0         
##  [9] Biobase_2.46.0              GenomicRanges_1.38.0       
## [11] GenomeInfoDb_1.22.0         IRanges_2.20.2             
## [13] spatialHeatmap_0.99.0       knitr_1.28                 
## [15] S4Vectors_0.24.3            BiocGenerics_0.32.0        
## [17] BiocStyle_2.14.4            nvimcom_0.9-25             
## 
## loaded via a namespace (and not attached):
##   [1] backports_1.1.5        rols_2.14.0           
##   [3] Hmisc_4.3-1            igraph_1.2.4.2        
##   [5] lazyeval_0.2.2         shinydashboard_0.7.1  
##   [7] splines_3.6.2          ggplot2_3.2.1         
##   [9] robust_0.4-18.2        digest_0.6.25         
##  [11] foreach_1.4.8          htmltools_0.4.0       
##  [13] magick_2.3             GO.db_3.10.0          
##  [15] gdata_2.18.0           magrittr_1.5          
##  [17] checkmate_2.0.0        memoise_1.1.0         
##  [19] fit.models_0.5-14      cluster_2.1.0         
##  [21] doParallel_1.0.15      readr_1.3.1           
##  [23] fastcluster_1.1.25     annotate_1.64.0       
##  [25] prettyunits_1.1.1      jpeg_0.1-8.1          
##  [27] colorspace_1.4-1       blob_1.2.1            
##  [29] rrcov_1.5-2            xfun_0.12             
##  [31] dplyr_0.8.4            crayon_1.3.4          
##  [33] RCurl_1.98-1.1         jsonlite_1.6.1        
##  [35] genefilter_1.68.0      impute_1.60.0         
##  [37] survival_3.1-8         iterators_1.0.12      
##  [39] glue_1.4.0             gtable_0.3.0          
##  [41] zlibbioc_1.32.0        XVector_0.26.0        
##  [43] DEoptimR_1.0-8         scales_1.1.0          
##  [45] mvtnorm_1.0-12         DBI_1.1.0             
##  [47] edgeR_3.28.0           Rcpp_1.0.3            
##  [49] viridisLite_0.3.0      xtable_1.8-4          
##  [51] progress_1.2.2         htmlTable_1.13.3      
##  [53] gridGraphics_0.5-0     flashClust_1.01-2     
##  [55] foreign_0.8-75         bit_1.1-15.2          
##  [57] preprocessCore_1.48.0  Formula_1.2-3         
##  [59] rsvg_1.3               htmlwidgets_1.5.1     
##  [61] httr_1.4.1             gplots_3.0.1.2        
##  [63] RColorBrewer_1.1-2     ellipsis_0.3.0        
##  [65] acepack_1.4.1          farver_2.0.3          
##  [67] pkgconfig_2.0.3        XML_3.99-0.3          
##  [69] nnet_7.3-12            locfit_1.5-9.1        
##  [71] dynamicTreeCut_1.63-1  labeling_0.3          
##  [73] ggplotify_0.0.5        tidyselect_1.0.0      
##  [75] rlang_0.4.5            later_1.0.0           
##  [77] AnnotationDbi_1.48.0   visNetwork_2.0.9      
##  [79] munsell_0.5.0          tools_3.6.2           
##  [81] RSQLite_2.2.0          fastmap_1.0.1         
##  [83] evaluate_0.14          stringr_1.4.0         
##  [85] ggdendro_0.1-20        yaml_2.2.1            
##  [87] bit64_0.9-7            robustbase_0.93-5     
##  [89] caTools_1.18.0         purrr_0.3.3           
##  [91] mime_0.9               compiler_3.6.2        
##  [93] rstudioapi_0.11        curl_4.3              
##  [95] plotly_4.9.2           png_0.1-7             
##  [97] tibble_2.1.3           geneplotter_1.64.0    
##  [99] pcaPP_1.9-73           stringi_1.4.6         
## [101] highr_0.8              lattice_0.20-40       
## [103] Matrix_1.2-18          vctrs_0.2.3           
## [105] pillar_1.4.3           lifecycle_0.1.0       
## [107] BiocManager_1.30.10    data.table_1.12.8     
## [109] bitops_1.0-6           grImport_0.9-3        
## [111] httpuv_1.5.2           R6_2.4.1              
## [113] latticeExtra_0.6-29    bookdown_0.18         
## [115] promises_1.1.0         KernSmooth_2.23-16    
## [117] gridExtra_2.3          codetools_0.2-16      
## [119] MASS_7.3-51.5          gtools_3.8.1          
## [121] assertthat_0.2.1       DESeq2_1.26.0         
## [123] GenomeInfoDbData_1.2.2 hms_0.5.3             
## [125] grid_3.6.2             rpart_4.1-15          
## [127] tidyr_1.0.2            rmarkdown_2.1         
## [129] rvcheck_0.1.8          shiny_1.4.0           
## [131] WGCNA_1.68             base64enc_0.1-3

8 Funding

This project has been funded by NSF awards: PGRP-1546879, PGRP-1810468, PGRP-1936492.

References

Cardoso-Moreira, Margarida, Jean Halbert, Delphine Valloton, Britta Velten, Chunyan Chen, Yi Shao, Angélica Liechti, et al. 2019. “Gene Expression Across Mammalian Organ Development.” Nature 571 (7766): 505–9.

Chang, Winston, and Barbara Borges Ribeiro. 2018. Shinydashboard: Create Dashboards with ’Shiny’. https://CRAN.R-project.org/package=shinydashboard.

Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. n.d. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.

Davis, Sean, and Paul Meltzer. 2007. “GEOquery: A Bridge Between the Gene Expression Omnibus (GEO) and BioConductor.” Bioinformatics 14: 1846–7.

Gatto, Laurent. 2019. Rols: An R Interface to the Ontology Lookup Service. http://lgatto.github.com/rols/.

Gautier, Laurent, Leslie Cope, Benjamin M. Bolstad, and Rafael A. Irizarry. 2004. “Affy—analysis of Affymetrix GeneChip Data at the Probe Level.” Bioinformatics 20 (3). Oxford, UK: Oxford University Press: 307–15. doi:10.1093/bioinformatics/btg405.

Keays, Maria. 2019. ExpressionAtlas: Download Datasets from EMBL-EBI Expression Atlas.

Langfelder, Peter, and Steve Horvath. 2008. “WGCNA: An R Package for Weighted Correlation Network Analysis.” BMC Bioinformatics 9 (December): 559.

———. 2012. “Fast R Functions for Robust Correlations and Hierarchical Clustering.” Journal of Statistical Software 46 (11): 1–17. http://www.jstatsoft.org/v46/i11/.

Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for RNA-Seq Data with DESeq2.” Genome Biology 15 (12): 550. doi:10.1186/s13059-014-0550-8.

Maag, Jesper L V. 2018. “Gganatogram: An R Package for Modular Visualisation of Anatograms and Tissues Based on Ggplot2.” F1000Res. 7 (September): 1576.

McCarthy, Davis J., Chen, Yunshun, Smyth, and Gordon K. 2012. “Differential Expression Analysis of Multifactor RNA-Seq Experiments with Respect to Biological Variation.” Nucleic Acids Research 40 (10): 4288–97.

Merkin, Jason, Caitlin Russell, Ping Chen, and Christopher B Burge. 2012. “Evolutionary Dynamics of Gene and Isoform Regulation in Mammalian Tissues.” Science 338 (6114): 1593–9.

Morgan, Martin, Valerie Obenchain, Jim Hester, and Hervé Pagès. 2018. SummarizedExperiment: SummarizedExperiment Container.

Muschelli, John, Elizabeth Sweeney, and Ciprian Crainiceanu. 2014. “BrainR: Interactive 3 and 4D Images of High Resolution Neuroimage Data.” R J. 6 (1): 41–48.

Mustroph, Angelika, M Eugenia Zanetti, Charles J H Jang, Hans E Holtan, Peter P Repetti, David W Galbraith, Thomas Girke, and Julia Bailey-Serres. 2009. “Profiling Translatomes of Discrete Cell Populations Resolves Altered Cellular Priorities During Hypoxia in Arabidopsis.” Proc Natl Acad Sci U S A 106 (44): 18843–8.

Papatheodorou, Irene, Nuno A Fonseca, Maria Keays, Y Amy Tang, Elisabet Barrera, Wojciech Bazant, Melissa Burke, et al. 2018. “Expression Atlas: gene and protein expression across multiple studies and organisms.” Nucleic Acids Res. 46 (D1): D246–D251. doi:10.1093/nar/gkx1158.

Prudencio, Mercedes, Veronique V Belzil, Ranjan Batra, Christian A Ross, Tania F Gendron, Luc J Pregent, Melissa E Murray, et al. 2015. “Distinct Brain Transcriptome Profiles in C9orf72-Associated and Sporadic ALS.” Nat. Neurosci. 18 (8): 1175–82.

Waese, Jamie, Jim Fan, Asher Pasha, Hans Yu, Geoffrey Fucile, Ruian Shi, Matthew Cumming, et al. 2017. “EPlant: Visualizing and Exploring Multiple Levels of Data for Hypothesis Generation in Plant Biology.” Plant Cell 29 (8): 1806–21.

Winter, Debbie, Ben Vinegar, Hardeep Nahal, Ron Ammar, Greg V Wilson, and Nicholas J Provart. 2007. “An ‘Electronic Fluorescent Pictograph’ Browser for Exploring and Analyzing Large-Scale Biological Data Sets.” PLoS One 2 (8): e718.